80 research outputs found

    Geodesic distances in the intrinsic dimensionality estimation using packing numbers

    Dimensionality reduction is an important tool in data mining, and the intrinsic dimensionality of a data set is a key parameter in many dimensionality reduction algorithms. When the intrinsic dimensionality is known, the dimensionality of the data can be reduced without losing much information, so it is reasonable to estimate it first. In this paper, one of the global estimators of intrinsic dimensionality, the packing numbers estimator (PNE), is explored experimentally. We propose a modification of the PNE method that uses geodesic distances in order to improve its estimates of the intrinsic dimensionality.
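
    The abstract does not state the estimator's formula. As a rough illustration only, the sketch below assumes the commonly cited two-scale form, in which the dimension is the negative slope of log M(r) against log r, and M(r) is the size of a greedy r-packing. The function names and the `dist` parameter are hypothetical; a geodesic distance function could be passed as `dist` in place of the Euclidean default.

```python
import math

def greedy_packing_number(points, r, dist=math.dist):
    """Size of a greedy r-packing: kept centers are pairwise at least r apart."""
    centers = []
    for p in points:
        if all(dist(p, c) >= r for c in centers):
            centers.append(p)
    return len(centers)

def packing_dimension(points, r1, r2, dist=math.dist):
    """Two-scale slope estimate: -(log M(r2) - log M(r1)) / (log r2 - log r1)."""
    m1 = greedy_packing_number(points, r1, dist)
    m2 = greedy_packing_number(points, r2, dist)
    return -(math.log(m2) - math.log(m1)) / (math.log(r2) - math.log(r1))

# Toy data (an assumption for illustration): a straight line embedded in 3-D.
line = [(float(i), 2.0 * i, 0.0) for i in range(1001)]
print(packing_dimension(line, r1=50.0, r2=100.0))
```

    The greedy packing depends on the order in which points are visited, so in practice the estimate would typically be averaged over several random orderings.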

    Novel Machine learning approach for Self-Aware prediction based on the Contextual reasoning

    Machine learning is effective at solving a wide range of applied problems. Nevertheless, machine learning methods lack contextual reasoning capabilities and cannot readily use additional information about circumstances, environments, backgrounds, and so on. Such information provides essential knowledge about the possible reasons for particular actions, yet machine learning methods cannot process it directly. This paper presents a context-aware machine learning approach for contextual reasoning about actor behavior and context-based prediction for threat assessment. The approach also uses context-aware prediction to handle interactions between actors. The idea is to use two classification methods cooperatively: the first predicts an actor’s behavior, and the second flags predicted actions (behavior) that are non-typical or unusual. Integrating the two methods allows an actor to perform a self-aware threat assessment based on relations between different actors, where the relations are described by multidimensional numerical data. The approach predicts the likely next situation and assesses its threat without waiting for future actions. It is based on the Decision Tree and Support Vector Method algorithms. Because of the complexity of its context, marine traffic data was chosen to demonstrate the capability of the approach. The technique could support an end-to-end approach to safe vessel navigation in maritime traffic with considerable ship congestion.

    Visual decisions in the analysis of customers online shopping behavior

    Analyzing the shopping behavior of online customers is an important task: it helps maximize the efficiency of advertising campaigns and increase the return on investment for advertisers. The results of such an analysis are usually reviewed by non-technical people, so they must be presented as simply as possible. Online shopping data is multidimensional and consists of both numerical and categorical values. This paper proposes an approach for the visual analysis of online shopping data and their relevance. It integrates several multidimensional data visualization methods of different natures, and the results of the visual analysis of the numerical data are combined with the categorical data values. Based on the visualization results, decisions on the advertising campaign can be made in order to increase the return on investment and attract more customers to buy in the online shop.

    Geodesic distances in the maximum likelihood estimator of intrinsic dimensionality

    When analyzing multidimensional data, we often have to reduce their dimensionality while preserving as much information about the analyzed data set as possible. To this end, it is reasonable to find out the intrinsic dimensionality of the data. In this paper, two techniques for estimating the intrinsic dimensionality are analyzed and compared: the maximum likelihood estimator (MLE) and the ISOMAP method. We also propose a way to obtain good estimates of the intrinsic dimensionality with the MLE method.
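
    The abstract does not give the estimator itself. The sketch below assumes the widely used nearest-neighbour form of the MLE, in which each point's local estimate is (k - 1) divided by the sum of log(T_k / T_j) over its first k - 1 neighbours, and the local estimates are averaged. The function name and the choice of k are hypothetical. The function takes a precomputed distance matrix, so a matrix of geodesic (graph shortest-path) distances could be supplied instead of Euclidean ones.

```python
import math
import random

def mle_intrinsic_dimension(dist_matrix, k=10):
    """Average of local nearest-neighbour MLE dimension estimates."""
    estimates = []
    for i, row in enumerate(dist_matrix):
        # Distances to the k nearest neighbours, excluding the point itself.
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:k]
        t_k = nearest[-1]
        # Assumes the neighbour distances are positive and not all equal.
        log_sum = sum(math.log(t_k / t_j) for t_j in nearest[:-1])
        estimates.append((k - 1) / log_sum)
    return sum(estimates) / len(estimates)

# Toy data (an assumption for illustration): a 2-D plane embedded in 3-D.
rng = random.Random(0)
pts = []
for _ in range(300):
    u, v = rng.random(), rng.random()
    pts.append((u, v, u + v))
dist_matrix = [[math.dist(p, q) for q in pts] for p in pts]
print(mle_intrinsic_dimension(dist_matrix, k=10))
```

    With this normalization the estimate tends to sit slightly above the true dimension for small k, which is one reason the choice of k matters.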

    Probabilistic Algorithm for Mining Frequent Sequences (Tikimybinis dažnų posekių paieškos algoritmas)

    Mining frequent sequences in large databases is important for the analysis of biological, climate, financial, and many other kinds of data. Exact frequent sequence mining algorithms scan the whole database many times, so when the database is large the search is slow or requires supercomputers. This paper proposes a new probabilistic algorithm for mining frequent sequences that analyzes a random sample drawn from the initial database in a specific way. Statistical inferences about frequent sequences in the initial database are made from the analysis of this sample. The algorithm is not exact, but it runs much faster than the exact algorithms and is suitable for exploratory statistical analysis. The error probabilities of the probabilistic algorithm are estimated using statistical methods. The algorithm can be combined with exact frequent sequence mining algorithms, and it can also be applied to the general problem of structure mining. Authors: Julija Pragarauskaitė, Gintautas Dzemyda.
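
    The abstract does not describe how the sample is formed or how the error probabilities are computed. The sketch below only illustrates the general idea of estimating from a sample, under assumptions that are not taken from the paper: the database is a list of records, a pattern counts as present when it occurs in order with gaps allowed, the sample is a simple random sample without replacement, and the uncertainty is a normal-approximation standard error. All function names are hypothetical.

```python
import math
import random

def contains_subsequence(record, pattern):
    """True if the items of pattern occur in record in order (gaps allowed)."""
    it = iter(record)
    return all(item in it for item in pattern)

def estimate_support(database, pattern, sample_size, rng=random):
    """Estimate the fraction of records containing pattern from a random sample."""
    sample = rng.sample(database, sample_size)
    hits = sum(contains_subsequence(rec, pattern) for rec in sample)
    p_hat = hits / sample_size
    # Standard error of a sample proportion (no finite-population correction).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / sample_size)
    return p_hat, std_err

# Toy database of character sequences (an assumption for illustration).
database = ["abcab", "bca", "aab", "cba"] * 250
p_hat, std_err = estimate_support(database, "ab", sample_size=100,
                                  rng=random.Random(0))
print(p_hat, std_err)
```

    Only `sample_size` records are scanned per pattern, which is where the speed-up over a full pass comes from; the price is the sampling error reported alongside the estimate.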

    Special Multilayer Perceptron for Multidimensional Data Visualization (Specialios struktūros daugiasluoksnis perceptronas daugiamačiams duomenims vizualizuoti)

    This paper presents a special feed-forward neural network, consisting of a radial basis function layer and a multilayer perceptron, which is proposed and investigated for multidimensional data visualization. The proposed visualization approach includes reducing the dimensionality of the data using radial basis functions, clustering the data, determining the numerical values that characterize each cluster (the parameters of the radial basis functions), and forming the data set used to train the multilayer perceptron. The outputs of the last hidden layer are taken as the coordinates of the visualized points. Authors: Laura Ringienė, Gintautas Dzemyda.

    Minimization of the mapping error using coordinate descent

    Visualization harnesses the perceptual capabilities of humans to provide visual insight into data. Structure-preserving projection methods can be used for multidimensional data visualization. The goal of this paper is to suggest and examine projection error minimization strategies that yield a better, less distorted projection. The classic algorithm for Sammon’s projection and two new modifications of it are examined. All the algorithms aim to minimize the projection error, because even a slight reduction in this error substantially changes the distribution of points on the plane. Conclusions are drawn from the results of experiments on artificial and real data sets.
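
    The abstract does not detail the modifications examined. The sketch below therefore shows only the standard Sammon projection error, together with a generic coordinate-wise search that changes one coordinate of one point at a time. The function names, the fixed step, and the accept-if-better rule are assumptions for illustration, not the paper's algorithms.

```python
import math

def sammon_stress(high, low):
    """Sammon's projection error between original and projected points."""
    total, error = 0.0, 0.0
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            d_star = math.dist(high[i], high[j])  # distance in original space
            d = math.dist(low[i], low[j])          # distance in the projection
            total += d_star
            error += (d_star - d) ** 2 / d_star    # assumes distinct points
    return error / total

def coordinate_descent_pass(high, low, step=0.1):
    """One sweep: shift each coordinate by +/-step when that lowers the error."""
    low = [list(p) for p in low]
    best = sammon_stress(high, low)
    for point in low:
        for axis in range(len(point)):
            original = point[axis]
            for candidate in (original + step, original - step):
                point[axis] = candidate
                trial = sammon_stress(high, low)
                if trial < best:
                    best = trial          # keep the improving move
                    break
                point[axis] = original    # otherwise undo it
    return low, best

# Toy data (an assumption for illustration): four 3-D points and a 2-D start.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
low = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
new_low, new_error = coordinate_descent_pass(high, low)
print(sammon_stress(high, low), new_error)
```

    Because a move is kept only when it lowers the error, a sweep can never increase it. Recomputing the full error for every trial move is wasteful, though, and a practical version would update only the terms involving the moved point.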

    Proceedings of the conference "Lithuanian MSc Research in Informatics and ICT" (Konferencijos „Lietuvos magistrantų informatikos ir IT tyrimai“ darbai)

    The conference "Lithuanian MSc Research in Informatics and ICT" is a venue for presenting research from Lithuanian MSc theses in informatics and ICT. The event aims to develop the skills of MSc and other students, familiarize them with the research of their peers, and encourage their interest in scientific activities. Students from Kaunas University of Technology and Vilnius University will give presentations at the conference.

    Proceedings of the conference "Lithuanian MSc Research in Informatics and ICT" (Konferencijos „Lietuvos magistrantų informatikos ir IT tyrimai“ darbai)

    The conference "Lithuanian MSc Research in Informatics and ICT" is a venue for presenting research from Lithuanian MSc theses in informatics and ICT. The event aims to develop the skills of MSc and other students, familiarize them with the research of their peers, and encourage their interest in scientific activities. Students from Kaunas University of Technology, Vilnius University, and Vytautas Magnus University will give presentations at the conference.

    Analysis of the efficiency of recommendatory systems algorithms in an e-bookshop (Rekomendacinės sistemos algoritmų veikimo elektroninio knygyno duomenų bazėje analizė)

    This paper analyzes the efficiency of various recommender system algorithms on the database of a local e-bookshop. The goal of the analysis is to determine, according to selected criteria, which algorithms perform most and least effectively on the data set used. An analytical review of free or open-source recommender system software is presented and comparison criteria are selected. According to these criteria, a comparative analysis of popular recommender system software is made, and experiments are carried out with the best-rated software to determine which algorithms are effective on the data set used for the experiments. Authors: Aurimas Rapečka, Virginijus Marcinkevičius, Gintautas Dzemyda.